Boosting Cross-Lingual Knowledge Linking via Concept Annotation
نویسندگان
چکیده
Automatically discovering cross-lingual links (CLs) between wikis can largely enrich the cross-lingual knowledge and facilitate knowledge sharing across different languages. In most existing approaches for cross-lingual knowledge linking, the seed CLs and the inner link structures are two important factors for finding new CLs. When there are insufficient seed CLs and inner links, discovering new CLs becomes a challenging problem. In this paper, we propose an approach that boosts cross-lingual knowledge linking by concept annotation. Given a small number of seed CLs and inner links, our approach first enriches the inner links in wikis by using concept annotation method, and then predicts new CLs with a regression-based learning model. These two steps mutually reinforce each other, and are executed iteratively to find as many CLs as possible. Experimental results on the English and Chinese Wikipedia data show that the concept annotation can effectively improve the quantity and quality of predicted CLs. With 50,000 seed CLs and 30% of the original inner links in Wikipedia, our approach discovered 171,393 more CLs in four runs when using concept annotation.
منابع مشابه
Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual
To advance information extraction and question answering technologies toward a more realistic path, the U.S. NIST (National Institute of Standards and Technology) initiated the KBP (Knowledge Base Population) task as one of the TAC (Text Analysis Conference) evaluation tracks. It aims to encourage research in automatic information extraction of named entities from unstructured texts with the ul...
متن کاملRECSA: Resource for Evaluating Cross-lingual Semantic Annotation
In recent years large repositories of structured knowledge (DBpedia, Freebase, YAGO) have become a valuable resource for language technologies, especially for the automatic aggregation of knowledge from textual data. One essential component of language technologies, which leverage such knowledge bases, is the linking of words or phrases in specific text documents with elements from the knowledg...
متن کاملCross-Lingual Question Answering Using Common Semantic Space
With the advent of Big Data concept, a lot of attention has been paid to structuring and giving semantic to this data. Knowledge bases like DBPedia play an important role to achieve this goal. Question answering systems are common approach to address expressivity and usability of information extraction from knowledge bases. Recent researches focused only on monolingual QA systems while cross-li...
متن کاملCross Lingual Entity Linking with Bilingual Topic Model
Cross lingual entity linking means linking an entity mention in a background source document in one language with the corresponding real world entity in a knowledge base written in the other language. The key problem is to measure the similarity score between the context of the entity mention and the document of the candidate entity. This paper presents a general framework for doing cross lingu...
متن کاملOverview of the TAC2011 Knowledge Base Population Track
In this paper we give an overview of the Knowledge Base Population (KBP) track at TAC 2011. The main goal of KBP is to promote research in discovering facts about entities and expanding a structured knowledge base with this information. Compared to KBP2010, we extended the Entity Linking task to require clustering of mentions of entities not already in the KB (‘NIL queries’). We also introduced...
متن کامل